
PILER-CR: Fast and accurate identification of CRISPR repeats

Author: C Pourcel
CJ Bult
DH Haft
FJ Mojica
JS Godde
KR Rasmussen
KS Makarova
M Dsouza
R Jansen
RC Edgar
Robert C Edgar
RT DeBoy
S Kurtz
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Sequencing of prokaryotic genomes has recently revealed the presence of CRISPR elements: short, highly conserved repeats separated by unique sequences of similar length. The distinctive sequence signature of CRISPR repeats can be found using general-purpose repeat- or pattern-finding software tools. However, the output of such tools is not always ideal for studying these repeats, and significant effort is sometimes needed to build additional tools and perform manual analysis of the output. RESULTS: We present PILER-CR, a program specifically designed for the identification and analysis of CRISPR repeats. The program executes rapidly, completing a 5 Mb genome in around 5 seconds on a current desktop computer. We validate the algorithm by manual curation and by comparison with published surveys of these repeats, finding that PILER-CR has both high sensitivity and high specificity. We also present a catalogue of putative CRISPR repeats identified in a comprehensive analysis of 346 prokaryotic genomes. CONCLUSION: PILER-CR is a useful tool for rapid identification and classification of CRISPR repeats. The software is donated to the public domain. Source code and a Linux binary are freely available at
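The background above defines the CRISPR signature as short repeats separated by unique spacers of similar length. As a rough illustration of that signature only (the abstract does not describe PILER-CR's own algorithm, and this is not it), the Python sketch below scans for exact k-mers that recur as a regularly spaced array. The function name and every threshold are hypothetical, and exact matching is a simplification: real repeat copies can carry mismatches.

```python
from collections import defaultdict


def find_crispr_like_arrays(seq, repeat_len, min_copies=3,
                            min_spacer=20, max_spacer=60, spacer_tol=0.2):
    """Toy CRISPR-signature scan (hypothetical; NOT the PILER-CR algorithm).

    Reports exact k-mers of length `repeat_len` that occur at least
    `min_copies` times, where every gap between consecutive copies (the
    putative spacer) lies in [min_spacer, max_spacer] and all spacer lengths
    are within `spacer_tol` (a fraction) of the first spacer's length.
    Returns a list of (repeat, [0-based start positions]).
    """
    # Index every window of length repeat_len by its sequence.
    starts_by_kmer = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        starts_by_kmer[seq[i:i + repeat_len]].append(i)

    hits = []
    for kmer, starts in starts_by_kmer.items():
        if len(starts) < min_copies:
            continue
        # Spacer length = distance from the end of one copy to the next copy.
        spacers = [b - a - repeat_len for a, b in zip(starts, starts[1:])]
        if not all(min_spacer <= s <= max_spacer for s in spacers):
            continue
        # Require spacers of similar length, relative to the first one.
        ref = spacers[0]
        if all(abs(s - ref) <= spacer_tol * ref for s in spacers):
            hits.append((kmer, starts))
    return hits
```

Because every window of length `repeat_len` is tested, a window shifted by a base or two can also pass when neighbouring spacers happen to share end bases, so overlapping hits would need merging in practice.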

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution

Author: A Klindworth
A Shade
AM Eren
BJ Haas
C Huttenhower
C Lozupone
C Quince
DE Hunt
DN Fredricks
EK Costello
H Ochman
JG Caporaso
JI Prosser
JJ Faith
JL VandeWalle
JR Brestoff
M Hamady
MGI Langille
Mikhail Tikhonov
MJ Morgan
MJ Rosen
N Fierer
N Kamada
ND Youngblut
Ned S Wingreen
O Lukjancenko
PD Schloss
PJ Turnbaugh
RC Edgar
Robert W Leach
SJ Song
SM Huse
SP Preheim
TP Tourova
V Kunin
WJ Sul
Y Huang
ZJ Zheng
Publication venue: Springer Science and Business Media LLC
Publication date: 11/07/2014

The standard approach to analyzing 16S tag sequence data, which relies on clustering reads by sequence similarity into Operational Taxonomic Units (OTUs), underexploits the accuracy of modern sequencing technology. We present a clustering-free approach to multi-sample Illumina datasets that can identify independent bacterial subpopulations regardless of the similarity of their 16S tag sequences. Using published data from a longitudinal time-series study of human tongue microbiota, we are able to resolve up to 20 distinct subpopulations within standard 97%-similarity OTUs, all ecologically distinct but with 16S tags differing by as little as 1 nucleotide (99.2% similarity). A comparative analysis of the oral communities of two cohabiting individuals reveals that most such subpopulations are shared between the two communities at 100% sequence identity, and that dynamical similarity between subpopulations in one host is strongly predictive of dynamical similarity between the same subpopulations in the other host. Our method can also be applied to samples collected in cross-sectional studies and can be used with the 454 sequencing platform. We discuss how the sub-OTU resolution of our approach can provide new insight into factors shaping community assembly. Comment: Updated to match the published version. 12 pages, 5 figures + supplement. Significantly revised for clarity, references added, results not changed
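To make the resolution argument concrete: dereplicating reads into exact sequence variants keeps tags that differ by a single base apart, whereas a 97% identity threshold would merge them. This is a minimal sketch under stated assumptions (equal-length, ungapped tags; hypothetical function names), not the authors' clustering-free method, which the abstract describes as working across multiple samples and their dynamics.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse reads into exact sequence variants (100% identity).

    Returns (sequence, count) pairs, most abundant first.
    """
    return Counter(reads).most_common()


def percent_identity(a, b):
    """Percent identity of two equal-length, ungapped tags (assumption: no indels)."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length (no gaps assumed)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)
```

For example, one mismatch in a hypothetical 125-nt tag gives 100 × 124/125 = 99.2% identity, well above a 97% OTU threshold, yet `dereplicate` still reports the two tags as separate variants.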

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

PubMed Central

Expedited batch processing and analysis of transposon insertions

Author: A Bohne
AL Price
BC Meyers
CM Bergman
David A Ray
ES Lander
GD Schuler
JE Stajich
Jeremy D Smith
PL Deininger
RC Edgar
RH Waterston
Sela
SF Altschul
Publication venue: BioMed Central
Publication date: 01/11/2011

Background: With advances in sequencing technology, ever-greater amounts of eukaryotic genome data are becoming available. Large portions of these genomes often consist of transposable elements, which frequently account for 50% or more of vertebrate genomes. Each transposable element family may have thousands or tens of thousands of individual copies within a given genome, so processing these data in a meaningful fashion can take an exorbitant amount of time and effort. Findings: To address this problem, we developed a set of bioinformatics techniques and programs to streamline the analysis. These include a unique Perl script that automates the process of taking BLAST, RepeatMasker and similar output and extracting and manipulating the hit sequences from the genome. This script, called Process_hits, uses an object-oriented methodology to compile all hit locations from a given file for processing, organize the data into usable categories, and output it in multiple formats. Conclusions: The program proved capable of handling large amounts of transposon data efficiently. It is equipped with a number of useful sub-functions, each contained within its own sub-module to allow for greater expandability and to serve as a foundation for future program design.
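Process_hits itself is a Perl script whose code is not reproduced here. As a hedged illustration of the first step it describes (compiling hit locations from a search-result file and outputting them in another format), the Python sketch below reads BLAST+ tabular output (`-outfmt 6`, default 12 columns), groups hits by genomic sequence, normalises minus-strand coordinates, and emits BED lines. The function names, the E-value cutoff, and the BED layout choices are assumptions; RepeatMasker `.out` files use a different layout and would need their own parser.

```python
import csv
from collections import defaultdict

# Positional names for the 12 default columns of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hits_by_subject(handle, max_evalue=1e-5):
    """Group hit locations by subject sequence (e.g. chromosome or contig).

    Returns {sseqid: [(start, end, strand, qseqid), ...]} using 1-based,
    inclusive coordinates normalised so that start <= end, sorted by start.
    Hits with an E-value above `max_evalue` (hypothetical cutoff) are dropped.
    """
    grouped = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(OUTFMT6, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        s, e = int(rec["sstart"]), int(rec["send"])
        strand = "+" if s <= e else "-"  # BLAST reports sstart > send on minus strand
        grouped[rec["sseqid"]].append((min(s, e), max(s, e), strand, rec["qseqid"]))
    for locs in grouped.values():
        locs.sort()
    return dict(grouped)


def to_bed(grouped):
    """Emit BED6 lines (0-based, half-open), e.g. for extracting hit sequences."""
    lines = []
    for chrom in sorted(grouped):
        for start, end, strand, name in grouped[chrom]:
            lines.append(f"{chrom}\t{start - 1}\t{end}\t{name}\t0\t{strand}")
    return lines
```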

Crossref

Directory of Open Access Journals

PubMed Central

Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

Author: AR Panchenko
B Morgenstern
B Rost
C Chothia
C Kemena
C Notredame
CB Do
Cédric Notredame
D Baker
DG Higgins
DT Jones
Eugene A. Permyakov
F Armougom
G Yona
GH Gonnet
H-N Lin
Hsin-Nan Lin
HY Zhou
J Skolnick
J Soding
JD Thompson
Jia-Ming Chang
JM Pei
K Katoh
L Rychlewski
L Wang
LA Kelley
MJ Sternberg
MO Dayhoff
O O'Sullivan
P Hogeweg
R Hagopian
R Sadreyev
RC Edgar
S Henikoff
SF Altschul
T Hara
T Müller
Ting-Yi Sung
U Roshan
VA Simossis
W Kabsch
Wen-Lian Hsu
Y Zhang
Publication venue: Public Library of Science
Publication date: 02/12/2011

Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently
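Consistency-based aligners re-weight a residue pair by how well it is supported through third sequences. A minimal sketch of that idea follows, modelled on T-Coffee's library extension (a well-known technique swapped in for illustration, not the evaluation function proposed in this paper): the extended weight of residues x and y is their direct weight plus, for every residue z in a third sequence aligned to both, min(W(x, z), W(z, y)). Residues are represented as hypothetical `(sequence_name, position)` tuples.

```python
from collections import defaultdict


def extend_library(primary):
    """Consistency (T-Coffee-style) extension of a pairwise residue library.

    primary: {(x, y): weight}, where x and y are (sequence_name, position)
    residues aligned in some pairwise alignment.
    Returns extended weights for both orientations of every supported pair.
    """
    w = {}
    neighbours = defaultdict(set)
    for (x, y), weight in primary.items():
        w[(x, y)] = w[(y, x)] = weight
        neighbours[x].add(y)
        neighbours[y].add(x)

    extended = defaultdict(float)
    # Direct support (w already holds both orientations).
    for pair, weight in w.items():
        extended[pair] += weight
    # Indirect support via an intermediate residue z from a third sequence.
    for z, nbrs in neighbours.items():
        for x in nbrs:
            for y in nbrs:
                if x[0] != y[0] and z[0] not in (x[0], y[0]):
                    extended[(x, y)] += min(w[(x, z)], w[(z, y)])
    return dict(extended)
```

A pair such as A0/B0 that is also consistent with a third sequence C gains weight, while an unsupported pair keeps only its direct weight, which is what lets consistency-based aligners prefer globally coherent residue matches.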

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The Minang Kaluak Paku Kacang Balimbiang Motif in Casual Wear

As one of the ethnic groups that give Indonesian culture its distinctive character, the Minangkabau have a cultural heritage spread across many aspects of life. One part of this heritage is the art of carving. Carving, developed from ideas drawn from nature, carries philosophical meanings for Minangkabau society. All the types of carving chiselled into the Rumah Gadang reflect key elements of Minangkabau culture, mirroring what exists in nature. One of the carvings on the Rumah Gadang is kaluak paku. Kaluak paku is the name of a carving motif in Minangkabau adat (customary tradition). It derives from the coiled shape (kelukan/kaluak) at the tip of a young fern (paku). The kaluak paku carving of the Rumah Gadang symbolizes a man's responsibility, under Minangkabau adat, toward the next generation: as father to his children and as mamak (maternal uncle) to his kemenakan (nieces and nephews). This Minangkabau kaluak paku Rumah Gadang carving is the source of inspiration for the garments created in this final project. The work was created using several methods: an aesthetic and ergonomic approach, data collection through literature study, and a creation method based on Gustami Sp's theory of three stages and six steps. The creative process required several kinds of data; reference data were collected from the literature, namely books, journals, social media, and smartphone applications such as Pinterest. The most important data collected were images of the visual form of the Minangkabau kaluak paku plant carving and of casual wear. The result is a collection of 8 casual garments. All of the garments share an A-line silhouette that flares at the bottom. The main material is primisima. The colour combination uses the characteristic Minangkabau colours taken from its customary flag, the “marawa”: red, black, and yellow.
The works produced with these colours fit the theme of the Minangkabau kaluak paku Rumah Gadang carving very well. Keywords: Minang, Kaluak Paku Kacang Balimbiang, Casual

Crossref

Directory of Open Access Journals

University of Tasmania Open Access Repository

Western Sydney ResearchDirect

Indonesian Institute of the Art Yogyakarta

FigShare

Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Author: A Löytynoja
B Sipos
BG Hall
BP Blackburne
C Chothia
C Dessimoz
C Kemena
C Notredame
CB Do
CL Strope
DA Dalquen
DA Morrison
DH Mathews
ER Mardis
G Blackshields
G Jordan
G Landan
GP Raghava
I Van Walle
J Kim
J Stoye
JD Thompson
JH Havgaard
JP Huelsenbeck
K Mizuguchi
LA Stebbings
M Anisimova
M Pop
MR Aniba
P Gardner
RA Cartwright
RB Russell
RC Edgar
SA Berger
SF Altschul
T Golubchik
T Koestler
T Lassmann
W Fletcher
Publication venue: Springer Science and Business Media LLC
Publication date: 09/11/2012

Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Alignment accuracy is therefore crucial to a vast range of analyses, often in ways that are difficult to assess within those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy. Comment: Review
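Structure-based benchmarks commonly score an aligner by the fraction of residue pairs in a reference alignment that the test alignment reproduces (the sum-of-pairs, or SP, score). The sketch below computes it for alignments given as name-to-gapped-string dictionaries; that representation and the function names are assumptions, and, as the review stresses, the score is only as trustworthy as the reference it is measured against.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs placed in the same column of a gapped MSA.

    alignment: {name: gapped string using '-'}, all rows of equal length.
    Residues are identified as (name, ungapped index) so that pairs can be
    compared between different alignments of the same sequences.
    """
    names = sorted(alignment)
    counters = {n: 0 for n in names}
    pairs = set()
    length = len(next(iter(alignment.values())))
    for col in range(length):
        residues = []
        for n in names:
            if alignment[n][col] != "-":
                residues.append((n, counters[n]))
                counters[n] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 0.0
```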

arXiv.org e-Print Archive

Crossref

UCL Discovery

Risk of Cerebrovascular Events in 178 962 Five-Year Survivors of Cancer Diagnosed at 15 to 39 Years of Age: The TYACSS (Teenage and Young Adult Cancer Survivor Study)

Author: Bright CJ
Cutter DJ
Edgar AB
Feltbower RG
Frobisher C
Guha J
Hall M
Hawkins MM
Henson KE
Kelly JS
Reulen RC
Winter DL
Publication venue: Ovid Technologies (Wolters Kluwer Health)
Publication date: 01/01/2017

Background: Survivors of teenage and young adult (TYA) cancer are at risk of cerebrovascular events, but the magnitude of this risk, and the extent to which it varies by cancer type, decade of diagnosis, age at diagnosis and attained age, remain uncertain. This is the largest cohort study to date to evaluate the risks of hospitalisation for a cerebrovascular event among long-term survivors of TYA cancer. Methods: The population-based Teenage and Young Adult Cancer Survivor Study (N=178,962) was linked to Hospital Episode Statistics data for England to investigate the risks of hospitalisation for a cerebrovascular event among 5-year survivors of cancer diagnosed when aged 15-39 years. Observed numbers of first hospitalisations for cerebrovascular events were compared with those expected from the general population using standardised hospitalisation ratios (SHR) and absolute excess risks (AER) per 10,000 person-years. Cumulative incidence was calculated with death considered a competing risk. Results: Overall, 2,782 cancer survivors were hospitalised for a cerebrovascular event, 40% more than expected (SHR=1.4, 95% confidence interval [CI]=1.3-1.4). Survivors of central nervous system (CNS) tumours (SHR=4.6, CI=4.3-5.0), head & neck tumours (SHR=2.6, CI=2.2-3.1) and leukaemia (SHR=2.5, CI=1.9-3.1) were at greatest risk. Males had a significantly higher AER than females (AER=7 versus 3), especially among head & neck tumour survivors (AER=30 versus 11). By age 60, 9%, 6% and 5% of CNS tumour, head & neck tumour, and leukaemia survivors, respectively, had been hospitalised for a cerebrovascular event. Beyond age 60, 0.4% of CNS tumour survivors were hospitalised for a cerebral infarction each year (versus 0.1% expected), whereas at any age, 0.2% of head & neck tumour survivors were hospitalised for a cerebral infarction each year (versus 0.06% expected).
Conclusions: Survivors of a CNS tumour, head & neck tumour, and leukaemia are particularly at risk of hospitalisation for a cerebrovascular event. The excess risk of cerebral infarction among CNS tumour survivors increases with attained age. For head & neck tumour survivors, this excess risk remains high across all ages. These groups of survivors, and in particular males, should be considered for surveillance of cerebrovascular risk factors and potential pharmacological interventions for cerebral infarction prevention.
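The two measures named in the Methods are simple ratios of observed (O) to expected (E) counts: SHR = O / E, and AER = 10,000 × (O - E) / person-years. A minimal sketch follows; the function names are illustrative, and the example numbers below are hypothetical, not taken from the study. Confidence intervals and the competing-risk cumulative incidence are not shown.

```python
def standardised_hospitalisation_ratio(observed, expected):
    """SHR: observed / expected first hospitalisations."""
    return observed / expected


def absolute_excess_risk(observed, expected, person_years, per=10_000):
    """AER: excess (observed - expected) events per `per` person-years."""
    return per * (observed - expected) / person_years
```

For instance, a hypothetical 140 observed versus 100 expected events over 50,000 person-years gives SHR = 1.4 (40% more than expected) and AER = 10,000 × 40 / 50,000 = 8 per 10,000 person-years.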

Crossref

University of Birmingham Research Portal

Oxford University Research Archive

White Rose Research Online

Optimizing substitution matrix choice and gap parameters for sequence alignment

Author: CB Do
CN Dewey
D Gusfield
DT Jones
E Kim
G Blackshields
GA Price
GH Gonnet
I Van Walle
J Flannick
J Kececioglu
J Pei
JD Thompson
JG Henikoff
K Katoh
M Box
MA Larkin
MO Dayhoff
MP Styczynski
MS Waterman
O Chapelle
RC Edgar
Robert C Edgar
S Henikoff
T Lassmann
T Muller
TM Phuong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy, rather than homolog recognition, is the goal. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pairwise global protein alignments. Results: POP is compared with a recent method due to Kim and Kececioglu and found to achieve accuracies 0.2% to 1.3% higher on pairwise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pairwise alignment benchmarks, with VTML200 giving the best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, typically giving accuracies 2% lower than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
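POP tunes gap penalties and selects substitution matrices for global pairwise protein alignment. To show where those parameters enter, here is a minimal sketch of Gotoh's affine-gap global alignment recursion (a standard algorithm, not POP itself). The toy ±1 `match_mismatch` function stands in for a real matrix such as VTML200, whose values are not reproduced here, and the gap-cost convention (a gap of length L costs gap_open + (L - 1) × gap_extend) is an assumption, since programs differ on it.

```python
NEG_INF = float("-inf")


def match_mismatch(x, y, match=1.0, mismatch=-1.0):
    """Toy substitution score; a stand-in for a real matrix such as VTML200."""
    return match if x == y else mismatch


def global_affine_score(a, b, score, gap_open, gap_extend):
    """Optimal global alignment score with affine gaps (Gotoh recursion).

    score(x, y): substitution score for residues x and y.
    A gap of length L costs gap_open + (L - 1) * gap_extend (both positive).
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Open a new gap (from M or the other gap state) or extend this one.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

With the toy scoring, `global_affine_score("ACGT", "AT", match_mismatch, 3, 1)` returns -2.0: two matched end residues (+2) and one length-2 gap (3 + 1). A parameter search of any kind would wrap a function like this, with traceback added so that accuracy can be measured against reference alignments, in a loop over candidate penalty values; that loop is not shown.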

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central